Generative Acoustic-Phonemic-Speaker Model Based on Three-Way Restricted Boltzmann Machine

Authors

  • Toru Nakashika
  • Yasuhiro Minami
Abstract

In this paper, we discuss a way of modeling speech signals based on a three-way restricted Boltzmann machine (3WRBM) that automatically separates phonetic-related information and speaker-related information in an observed signal. The proposed model is an energy-based probabilistic model that includes three-way potentials of three variables: acoustic features, latent phonetic features, and speaker-identity features. We train the model so that it automatically captures the undirected relationships among the three variables. Once the model is trained, it can be applied to many tasks in speech signal processing. For example, given a speech signal, estimating the speaker-identity features is equivalent to speaker recognition; on the other hand, the estimated latent phonetic features may be helpful for speech recognition because they contain more phonetic-related information than the acoustic features. Since the model is generative, we can also apply it to voice conversion: we estimate acoustic features from the desired speaker-identity features together with the phonetic features that were estimated from the source speaker's acoustic features. In our experiments, we discuss the effectiveness of this speech modeling through speaker recognition, speech (continuous phone) recognition, and voice conversion tasks.
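The abstract does not give the energy function, so the following is only a minimal sketch of how a three-way energy-based model of this kind could support the three tasks it mentions. Everything here is an assumption made for illustration: unit-variance Gaussian acoustic features `v`, binary latent phonetic features `h`, a one-hot speaker vector `s`, an unfactored weight tensor `W`, the toy sizes, and all function names. The weights are random rather than trained, so the outputs carry no meaning beyond showing the mechanics.

```python
import numpy as np

rng = np.random.default_rng(0)
n_v, n_h, n_s = 6, 4, 3  # hypothetical sizes: acoustic, phonetic, speaker

# Assumed parameters (random, untrained): three-way weights and biases.
W = 0.1 * rng.standard_normal((n_v, n_h, n_s))
b = np.zeros(n_v)
c = np.zeros(n_h)
d = np.zeros(n_s)

def energy(v, h, s):
    """Assumed energy: Gaussian term on v, biases, and a three-way potential."""
    three_way = np.einsum("ijk,i,j,k->", W, v, h, s)
    return 0.5 * v @ v - b @ v - c @ h - d @ s - three_way

def phonetic_given(v, s):
    """p(h_j = 1 | v, s): latent phonetic features from acoustics and speaker."""
    act = c + np.einsum("ijk,i,k->j", W, v, s)
    return 1.0 / (1.0 + np.exp(-act))

def speaker_posterior(v):
    """p(s = k | v), with the binary h summed out analytically (softplus)."""
    act = c[:, None] + np.einsum("ijk,i->jk", W, v)  # shape (n_h, n_s)
    logits = d + np.logaddexp(0.0, act).sum(axis=0)
    logits -= logits.max()  # numerical stability
    p = np.exp(logits)
    return p / p.sum()

def convert(v_src, s_src, s_tgt):
    """Voice conversion: infer h from the source, then take E[v | h, s_tgt]."""
    h = phonetic_given(v_src, s_src)
    return b + np.einsum("ijk,j,k->i", W, h, s_tgt)
```

Under these assumptions, `speaker_posterior` corresponds to speaker recognition, `phonetic_given` yields the features that might feed a speech recognizer, and `convert` swaps the speaker vector while keeping the inferred phonetic features fixed.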

Similar resources

Voice Conversion Based on Speaker-Dependent Restricted Boltzmann Machines

This paper presents a voice conversion technique using speaker-dependent Restricted Boltzmann Machines (RBM) to build high-order eigen spaces of source/target speakers, where it is easier to convert the source speech to the target speech than in the traditional cepstrum space. We build a deep conversion architecture that concatenates the two speaker-dependent RBMs with neural networks, expecting ...

Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine

Straightforward application of Deep Belief Nets (DBNs) to acoustic modeling produces a rich distributed representation of speech data that is useful for recognition and yields impressive results on the speaker-independent TIMIT phone recognition task. However, the first-layer Gaussian-Bernoulli Restricted Boltzmann Machine (GRBM) has an important limitation, shared with mixtures of diagonalcova...

A Hybrid Algorithm based on Deep Learning and Restricted Boltzmann Machine for Car Semantic Segmentation from Unmanned Aerial Vehicles (UAVs)-based Thermal Infrared Images

Nowadays, ground vehicle monitoring (GVM) is one of the areas of application in the intelligent traffic control system using image processing methods. In this context, the use of unmanned aerial vehicles based on thermal infrared (UAV-TIR) images is one of the optimal options for GVM due to the suitable spatial resolution, cost-effective and low volume of images. The methods that have been prop...

First attempt of boltzmann machines for speaker verification

Frequently organized by NIST, Speaker Recognition evaluations (SRE) show high accuracy rates. This demonstrates that this field of research is mature. The latest progresses came from the proposition of low dimensional i-vectors representation and new classifiers such as Probabilistic Linear Discriminant Analysis (PLDA) or Cosine Distance classifier. In this paper, we study some variants of Bolt...

Tandem deep features for text-dependent speaker verification

Although deep learning has been successfully used in acoustic modeling of speech recognition, it has not been thoroughly investigated and widely accepted for speaker verification. This paper describes an investigation of using various types of deep features in a Tandem fashion for text-dependent speaker verification. Three types of networks are used to extract deep features: restricted Boltzman...

Publication date: 2016